IMPORTANT NOTES

  • Necessary to avoid R variable names like variable.name since . in python is somewhat analogous to $ in R.

  • Follow the instructions in the stitches-in-R-setup R markdown so that this notebook will work. If you’ve followed that markdown, stitches should be installed in the r-reticulate virtual environment for this notebook and should be callable.

  • there’s some redundancies on my python imports, I should put in one place and clean up at some point.

  • gridded stitching does slow things down. This notebook takes maybe 15 seconds to knit when there’s no gridded calculations. Adding them increases the knit time to maybe like 5 minutes.

  • can add eval=FALSE for those blocks so people can get a first knit notebook quickly (all towards end).

Setup - reticulate

library(reticulate)
## Warning: package 'reticulate' was built under R version 3.5.2
# # indicate that we want to use a specific virtualenv
use_virtualenv("r-reticulate", required =TRUE)

quick reticulate test

quick block of code that will only successfully knit if the setup was done correctly

import numpy as np
print(np.max([2,3]))
## 3

make sure can import stitches and call a function from it

This is a good function to test a stitches installation with - no arguments needed, just a list of data on pangeo, implicitly tests that can connect to pangeo.

import pandas as pd
pd.set_option('display.max_columns', None)

import stitches
pangeo_table = stitches.fx_pangeo.fetch_pangeo_table()
print(pangeo_table.head())
##   activity_id institution_id     source_id       experiment_id member_id  \
## 0  HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1   
## 1  HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1   
## 2  HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1   
## 3  HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1   
## 4  HighResMIP           CMCC  CMCC-CM2-HR4  highresSST-present  r1i1p1f1   
## 
##   table_id variable_id grid_label  \
## 0     Amon          ps         gn   
## 1     Amon        rsds         gn   
## 2     Amon        rlus         gn   
## 3     Amon        rlds         gn   
## 4     Amon         psl         gn   
## 
##                                               zstore  dcpp_init_year   version  
## 0  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706  
## 1  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706  
## 2  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706  
## 3  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706  
## 4  gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/...             NaN  20170706

Setup - R packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)

Load Hector data

pulled a quick rcp4.5 from HectorUI

hector_tgav <- read.csv('data/Hector-data-2022-09-22.csv', skip=3, stringsAsFactors = FALSE)
hector_exp <- unique(hector_tgav$scenario)
head(hector_tgav)
##                scenario year variable     value units
## 1 RCP  Standard-RCP-4.5 1800     Tgav 0.1176974  degC
## 2 RCP  Standard-RCP-4.5 1801     Tgav 0.1212491  degC
## 3 RCP  Standard-RCP-4.5 1802     Tgav 0.1243894  degC
## 4 RCP  Standard-RCP-4.5 1803     Tgav 0.1272345  degC
## 5 RCP  Standard-RCP-4.5 1804     Tgav 0.1298589  degC
## 6 RCP  Standard-RCP-4.5 1805     Tgav 0.1323124  degC

Make sure we can work with it from a python block

This is also a quick back and forth of moving data between R and python blocks following:

import numpy as np
import pandas as pd

tgav_df = pd.DataFrame(r.hector_tgav)
print(np.max(tgav_df[['value']]))
## value    2.552422
## dtype: float64
## 
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/numpy/core/fromnumeric.py:84: FutureWarning: In a future version, DataFrame.max(axis=None) will return a scalar max over the entire DataFrame. To retain the old behavior, use 'frame.max(axis=0)' or just 'frame.max()'
##   return reduction(axis=axis, out=out, **passkwargs)

and we can compare to the R evaluation

# The hector data we read straight in to R
print(max(hector_tgav$value))
## [1] 2.552422
# R operations on the python data
print(max(py$tgav_df$value))
## [1] 2.552422

Reshape the Hector data to be stitches target

First we reshape the Hector time series to match the CMIP6 ESM GSAT smooth anomaly time series that STITCHES operates on: NOTE we should probably add this as a function to stitches NOTE Hector time series start in 1800 - there’s no science reason we couldn’t do matching over that window and have a gridded netcdf of an extra 50 years of history.

hector_tgav %>%
    mutate(variable = 'tas',
           ensemble = 'hectorUI1',# doesn't affect the matching or stitching
           model = 'Hector') %>%
    rename(experiment = scenario) %>% 
    filter(year >= 1850) %>%
    select(variable, experiment, ensemble, model, year, value,  -units) ->
    target_tgav



print(head(target_tgav))
##   variable            experiment  ensemble  model year      value
## 1      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1850 0.08854785
## 2      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1851 0.09338570
## 3      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1852 0.09959573
## 4      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1853 0.10672673
## 5      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1854 0.11386913
## 6      tas RCP  Standard-RCP-4.5 hectorUI1 Hector 1855 0.10890346

Convert Hector Tgav to stitches windows

Now that we have shaped it as stitches is expecting, we can use stitches functions to calculate the matching windows for this target data

import stitches 
import pandas as pd
import pkg_resources
import xarray as xr
import numpy as np
pd.set_option('display.max_columns', None)


target_data = stitches.fx_processing.get_chunk_info(
    stitches.fx_processing.chunk_ts(df = r.target_tgav, n=9))
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:121: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
##   extra_info = df[extra_columns].drop_duplicates()
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
##   fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:171: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
##   extra_info["i"] = 1
print(target_data.head())
##     ensemble             experiment variable   model  start_yr  end_yr  year  \
## 0  hectorUI1  RCP  Standard-RCP-4.5      tas  Hector      1850    1858  1854   
## 1  hectorUI1  RCP  Standard-RCP-4.5      tas  Hector      1859    1867  1863   
## 2  hectorUI1  RCP  Standard-RCP-4.5      tas  Hector      1868    1876  1872   
## 3  hectorUI1  RCP  Standard-RCP-4.5      tas  Hector      1877    1885  1881   
## 4  hectorUI1  RCP  Standard-RCP-4.5      tas  Hector      1886    1894  1890   
## 
##          fx        dx  
## 0  0.113869 -0.011713  
## 1  0.067891  0.011509  
## 2  0.141379  0.002686  
## 3  0.162569 -0.029602  
## 4 -0.045458  0.014252

Load the stitches archive data for matching

variables and experiments of interest

We want to make sure that when we match to a point in an archive, that point has every netcdfs logged for every variable we care about.

esm = ['CanESM5']
vars1 = ['tas', 'pr', 'hurs', 'psl']
exps = ['ssp126', 'ssp245', 'ssp370', 'ssp460', 'ssp585']
# pangeo table of ESMs for reference
pangeo_path = pkg_resources.resource_filename('stitches', 'data/pangeo_table.csv')
pangeo_data = pd.read_csv(pangeo_path)
# print(np.sort(pangeo_data.variable.unique()))
pangeo_data = pangeo_data[((
                               (pangeo_data['variable'].isin(vars1)) )  )
                          & ((pangeo_data['domain'].str.contains('mon')) ) &
                           ((pangeo_data['experiment'].isin(exps))) &
                           (pangeo_data['model'].isin(esm))].copy()



pangeo_good_ensembles =[]
for name, group in pangeo_data.groupby(['model', 'experiment', 'ensemble']):
    df = group.drop_duplicates().copy()
    if len(df) >= len(vars1):
        pangeo_good_ensembles.append(df)
    del(df)
pangeo_good_ensembles1 = pd.concat(pangeo_good_ensembles)
pangeo_good_ensembles2  = pangeo_good_ensembles1[['model', 'experiment', 'ensemble']].drop_duplicates().copy()
pangeo_good_ensembles = pangeo_good_ensembles2.reset_index(drop=True).copy()
del(pangeo_good_ensembles1)
del(pangeo_good_ensembles2)

subset archive appropriately

path = pkg_resources.resource_filename('stitches', 'data/matching_archive.csv')
archive_data = pd.read_csv(path)


# Keep only the entries that appeared in pangeo_good_ensembles:
keys =['model', 'experiment', 'ensemble']
i1 = archive_data.set_index(keys).index
i2 = pangeo_good_ensembles.set_index(keys).index
archive_data= archive_data[i1.isin(i2)].copy()
del(i1)
del(i2)

# # Don't keep archive points with a base year of every year, just the defaults and
# # the midpont
# # do the subset to just core years + 1/2 window offset years.
# years = np.concatenate((1854+9*np.array(list( range(0,28))), 1859+9*np.array(list( range(0,27)))))
# archive_data = archive_data[archive_data['year'].isin(years)].reset_index(drop=True).copy()

print(archive_data.head())
##        ensemble variable    model experiment  start_yr  end_yr  year  \
## 6996  r10i1p1f1      tas  CanESM5     ssp126      1850    1858  1854   
## 6997  r10i1p1f1      tas  CanESM5     ssp126      1859    1867  1863   
## 6998  r10i1p1f1      tas  CanESM5     ssp126      1868    1876  1872   
## 6999  r10i1p1f1      tas  CanESM5     ssp126      1877    1885  1881   
## 7000  r10i1p1f1      tas  CanESM5     ssp126      1886    1894  1890   
## 
##             fx        dx  
## 6996 -1.360399  0.016472  
## 6997 -1.297474 -0.001508  
## 6998 -1.200856  0.010901  
## 6999 -1.218503 -0.016847  
## 7000 -1.285527  0.006000

Match


my_recipes = stitches.make_recipe(target_data, archive_data,
                                  non_tas_variables=['pr', 'psl', 'hurs'],
                                  tol = 0.13, N_matches = 10000)
## You have requested more recipes than possible for at least one target trajectories, returning what can
print(my_recipes.head())
##   target_start_yr target_end_yr archive_experiment archive_variable  \
## 0            1850          1858         historical              tas   
## 1            1859          1867         historical              tas   
## 2            1868          1876         historical              tas   
## 3            1877          1885         historical              tas   
## 4            1886          1894         historical              tas   
## 
##   archive_model archive_ensemble                       stitching_id  \
## 0       CanESM5        r14i1p1f1  RCP  Standard-RCP-4.5~hectorUI1~1   
## 1       CanESM5        r24i1p1f1  RCP  Standard-RCP-4.5~hectorUI1~1   
## 2       CanESM5        r24i1p1f1  RCP  Standard-RCP-4.5~hectorUI1~1   
## 3       CanESM5        r20i1p1f1  RCP  Standard-RCP-4.5~hectorUI1~1   
## 4       CanESM5        r24i1p1f1  RCP  Standard-RCP-4.5~hectorUI1~1   
## 
##   archive_start_yr archive_end_yr  \
## 0             2003           2011   
## 1             2003           2011   
## 2             2003           2011   
## 3             2003           2011   
## 4             2003           2011   
## 
##                                             tas_file  \
## 0  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 1  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 2  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 3  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 4  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 
##                                            hurs_file  \
## 0  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 1  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 2  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 3  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 4  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 
##                                              pr_file  \
## 0  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 1  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 2  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 3  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 4  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...   
## 
##                                             psl_file  
## 0  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...  
## 1  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...  
## 2  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...  
## 3  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...  
## 4  gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...

Unless otherwise specified by the argument reproducible = True, the draws stitches takes from the pool of matches is stochastic. Do another draw to show off:


my_recipes2 = stitches.make_recipe(target_data, archive_data,
                                  non_tas_variables=['pr', 'psl', 'hurs'],
                                  tol = 0.13, N_matches = 10000)
## You have requested more recipes than possible for at least one target trajectories, returning what can

Within each of these ensembles, there’s no envelope collapse. If we were to concatenate them together into a super-ensemble, there would be. So it’s not as powerful as a very large ensemble of collapse free realizations but it’s still a distinct ensemble to consider.

stitch Tgav

stitched_global_temp = stitches.gmat_stitching(my_recipes)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_stitch.py:344: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
##   for name, match in rp.groupby(['stitching_id']):
print(stitched_global_temp.head())
##    year     value variable                       stitching_id
## 0  1850  0.189172      tas  RCP  Standard-RCP-4.5~hectorUI1~1
## 1  1851  0.302949      tas  RCP  Standard-RCP-4.5~hectorUI1~1
## 2  1852  0.170092      tas  RCP  Standard-RCP-4.5~hectorUI1~1
## 3  1853  0.091319      tas  RCP  Standard-RCP-4.5~hectorUI1~1
## 4  1854  0.129352      tas  RCP  Standard-RCP-4.5~hectorUI1~1
stitched_global_temp2 = stitches.gmat_stitching(my_recipes2)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_stitch.py:344: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
##   for name, match in rp.groupby(['stitching_id']):

plot Tgav

if you’re really pressed for computing time, you can pair this with pattern scaling and get the mean spatial field. But we can do internal variability at the gridded multivariate scale, so we do.

stitch gridded

Using the above recipes, stitches can produce new gridded netcdf files for each recipe, for multiple variables. Creating multiple netcdfs of monthly data is slower than focusing on GSAT. Still much faster than an ESM, and once the recipes are set, very parallelizable. But for tractability, we are just going to stitch the gridded netcdfs for one ensemble member.

# Just do one realization
recipe = my_recipes[my_recipes['stitching_id'] == 'RCP  Standard-RCP-4.5~hectorUI1~1'].reset_index(drop=True).copy()

We must specify a directory for the netcdfs to be saved to. We will use simply do it directly here. On my computer, constructing these 4 netcdfs took ~2-3 minutes.

stitches.gridded_stitching(out_dir='data', rp =recipe)
## ['Stitching gridded netcdf for: CanESM5 tas RCP  Standard-RCP-4.5~hectorUI1~1']
## ['data/stitched_CanESM5_tas_RCP  Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_hurs_RCP  Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_pr_RCP  Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_psl_RCP  Standard-RCP-4.5~hectorUI1~1.nc']

plot gridded

  • we will do maps of all 4 variables at a specified month and year
  • we will do a time series of each variable in a specific grid cell
# pull all netcdfs:
files <- list.files('data', pattern  = '.nc', full.names = TRUE)

#subset to just the ESM of interest:
files <- files[grepl(py$esm, files)]

# subset to just the Hector scenario read in above:
files <- files[grepl(hector_exp, files)]

print(files)
## [1] "data/stitched_CanESM5_hurs_RCP  Standard-RCP-4.5~hectorUI1~1.nc"
## [2] "data/stitched_CanESM5_pr_RCP  Standard-RCP-4.5~hectorUI1~1.nc"  
## [3] "data/stitched_CanESM5_psl_RCP  Standard-RCP-4.5~hectorUI1~1.nc" 
## [4] "data/stitched_CanESM5_tas_RCP  Standard-RCP-4.5~hectorUI1~1.nc"
map_time <- '2100-12-31'
  • I like python xarray functions for reading in and plotting netcdf data better, but here is some code in R that twill result in a long data frame for each variable. Lot slower than python functions so set to eval=FALSE

r netcdf read in for r plotting

library(netcdf4)
# function to convert dates
  ##TODO you'll have to adjust this depending on your calendar and units in climate data:
  # days since 1850-01-01 00:00:00, 365 day year
  convert_time <- function(time, reference_year, nc_start_year){
    time %>%
      dplyr::mutate(time_index = as.integer(row.names(.)) - 1 + 12*(nc_start_year - reference_year),
             month = floor(time_index %% 12) + 1,
             year = floor(time_index/ 12) + reference_year,
             time_id = paste0(month, '~', year )) %>%
      dplyr::select(time, time_id)
  }

# get the data read in and reshaped
for (v in py$vars1){
  nc_name <- files[grepl(v, files)]
  print(nc_name)
  
  # open file
  ncfile1 <- ncdf4::nc_open(nc_name)
  
  # pull off lon/lat/time info
  lat1 <- ncdf4::ncvar_get(ncfile1,"lat")
  lon1 <- ncdf4::ncvar_get(ncfile1,"lon")
  time1 <- ncdf4::ncvar_get(ncfile1, "time") # units: days since 1850-01-31 00:00:00
  
  # get the var data as a [nlat*nlon, ntime] unlabled matrix
  var_data <- t(rbind(matrix(aperm(ncdf4::ncvar_get(ncfile1,v),c(3,1,2)),length(time1),length(lat1)*length(lon1))))
  ncdf4:nc_close()
  
  # add time labels
  colnames(var_data) <- convert_time(time = data.frame(time=time1),
                                 reference_year = 1850,
                                 nc_start_year = 1850)$time_id
  
  # add lat/lon labels and reshape to long format
  grid <- expand.grid(list(lon=lon1,lat=lat1))
  assign(paste0(v, 'data'),
          cbind(grid, var_data) %>%
      tidyr::gather(time, value, -lon, -lat) %>%
      tidyr::separate(time, into = c('month', 'year'), sep = '~') %>%
      dplyr::mutate(year = as.integer(year),
             month=as.integer(month)))
  rm(var_data)
  rm(time1)
  rm(lon1)
  rm(lat1)
  rm(grid)
}

python maps and time series

It’s a lot easier to use python to load in the netcdfs and select a grid cell or a time slice, save those off as simpler data frames than a full netcdf, and plot in R.’

tas

TODO just make it a function

py_files = pd.DataFrame(r.files, columns = ['file_name'])

# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('tas')].values[0]
tas_nc = xr.open_dataset(f[0])

# time slice map:
tas_time_slice = tas_nc.sel(time = r.map_time).copy()


# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # this is probably not actually right 

# lat and lon coordinates closest
abslat = np.abs(tas_nc.lat - cp_lat)
abslon = np.abs(tas_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = tas_nc.lon[lon_loc]
lat_grid = tas_nc.lat[lat_loc]

tas_grid_slice = tas_nc.sel(lon = lon_grid, lat=lat_grid,
                         time = slice('1850-01-01', '2099-12-31')).copy()
del(tas_nc)

Take a look at this data in R:

# map data:
print(py$tas_time_slice$tas)
## <xarray.DataArray 'tas' (lat: 64, lon: 128)>
## array([[254.8431 , 254.7373 , 254.57207, ..., 255.20908, 255.08337, 254.97351],
##        [255.91133, 255.52061, 255.24324, ..., 257.03964, 256.59912, 256.26254],
##        [255.315  , 254.83546, 254.41234, ..., 256.6023 , 256.1585 , 255.75359],
##        ...,
##        [271.74088, 272.53   , 273.20392, ..., 267.6897 , 269.28156, 270.6536 ],
##        [271.0604 , 271.45453, 271.7761 , ..., 269.08386, 269.78717, 270.53555],
##        [270.7581 , 270.85852, 270.92258, ..., 270.49863, 270.57562, 270.64154]],
##       dtype=float32)
## Coordinates:
##     time     datetime64[ns] 2100-12-31
##   * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
##   * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
##     units:            K
##     variable:         tas
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_tas_RCP  Standard-RCP-4.5~hectorU...
tas_val <- py$tas_time_slice$tas$values
tas_lon <- py$tas_time_slice$lon$values
tas_lat <- py$tas_time_slice$lat$values
grid <- expand.grid(list(lon=tas_lon,lat=tas_lat))
tas <- cbind(grid, 
             value=t(matrix(aperm(tas_val,c(2,1)), 1,length(tas_lat)*length(tas_lon))))

as.character(py$tas_time_slice$time) %>%
  substr(.,  37, 43) ->
  tas_time
  
ggplot(tas, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
  ggtitle(paste(tas_time, 'tas (K)'))

# grid time series data:
print(py$tas_grid_slice$tas)
## <xarray.DataArray 'tas' (time: 3000)>
## array([269.74368, 273.0087 , 274.19226, ..., 291.71487, 278.65762, 273.592  ],
##       dtype=float32)
## Coordinates:
##   * time     (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
##     lat      float64 37.67
##     lon      float64 255.9
## Attributes:
##     units:            K
##     variable:         tas
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_tas_RCP  Standard-RCP-4.5~hectorU...
cp_tas <- data.frame(value = py$tas_grid_slice$tas$values)

data.frame(time = as.character(py$tas_grid_slice$time$values)) %>%
  separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
  select(-day) %>%
  mutate(year = as.integer(year),
         month = as.integer(month),
         time_index = as.integer(row.names(.))) ->
  cp_time

ggplot(cbind(cp_time, cp_tas), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly tas, 1850-2100 (K)') 

ggplot(cbind(cp_time, cp_tas) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly tas, 2021-2100 (K)') 

pr

py_files = pd.DataFrame(r.files, columns = ['file_name'])

# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('pr')].values[0]
pr_nc = xr.open_dataset(f[0])

# time slice map:
pr_time_slice = pr_nc.sel(time = r.map_time).copy()


# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # this is probably not actually right 

# lat and lon coordinates closest
abslat = np.abs(pr_nc.lat - cp_lat)
abslon = np.abs(pr_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = pr_nc.lon[lon_loc]
lat_grid = pr_nc.lat[lat_loc]

pr_grid_slice = pr_nc.sel(lon = lon_grid, lat=lat_grid,
                         time = slice('1850-01-01', '2099-12-31')).copy()
del(pr_nc)

Take a look at this data in R:

# map data:
print(py$pr_time_slice$pr)
## <xarray.DataArray 'pr' (lat: 64, lon: 128)>
## array([[3.386200e-06, 3.238463e-06, 3.016512e-06, ..., 3.645885e-06,
##         3.673917e-06, 3.301501e-06],
##        [5.245191e-06, 4.769758e-06, 4.283163e-06, ..., 6.483961e-06,
##         5.810207e-06, 5.658150e-06],
##        [3.322042e-06, 3.376860e-06, 3.215622e-06, ..., 6.009493e-06,
##         4.892620e-06, 4.685948e-06],
##        ...,
##        [1.786260e-05, 1.796068e-05, 1.756605e-05, ..., 3.057127e-05,
##         2.430753e-05, 1.891175e-05],
##        [1.662058e-05, 1.698199e-05, 1.603339e-05, ..., 2.208773e-05,
##         2.233014e-05, 1.938729e-05],
##        [1.454303e-05, 1.400501e-05, 1.383822e-05, ..., 1.702093e-05,
##         1.704664e-05, 1.702684e-05]], dtype=float32)
## Coordinates:
##     time     datetime64[ns] 2100-12-31
##   * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
##   * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
##     units:            kg m-2 s-1
##     variable:         pr
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_pr_RCP  Standard-RCP-4.5~hectorUI...
pr_val <- py$pr_time_slice$pr$values
pr_lon <- py$pr_time_slice$lon$values
pr_lat <- py$pr_time_slice$lat$values
grid <- expand.grid(list(lon=pr_lon,lat=pr_lat))
pr <- cbind(grid, 
             value=t(matrix(aperm(pr_val,c(2,1)), 1,length(pr_lat)*length(pr_lon))))

as.character(py$pr_time_slice$time) %>%
  substr(.,  37, 43) ->
  pr_time
  
ggplot(pr, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
  ggtitle(paste(pr_time, 'pr (kg m-2 s-1)'))

# grid time series data:
print(py$pr_grid_slice$pr)
## <xarray.DataArray 'pr' (time: 3000)>
## array([7.720855e-06, 1.992142e-06, 1.111988e-05, ..., 2.953856e-06,
##        8.631851e-06, 4.242951e-06], dtype=float32)
## Coordinates:
##   * time     (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
##     lat      float64 37.67
##     lon      float64 255.9
## Attributes:
##     units:            kg m-2 s-1
##     variable:         pr
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_pr_RCP  Standard-RCP-4.5~hectorUI...
cp_pr <- data.frame(value = py$pr_grid_slice$pr$values)

data.frame(time = as.character(py$pr_grid_slice$time$values)) %>%
  separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
  select(-day) %>%
  mutate(year = as.integer(year),
         month = as.integer(month),
         time_index = as.integer(row.names(.))) ->
  cp_time

ggplot(cbind(cp_time, cp_pr), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly pr (kg m-2 s-1), 1850-2100') 

ggplot(cbind(cp_time, cp_pr) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly pr (kg m-2 s-1), 2021-2100') 

hurs

py_files = pd.DataFrame(r.files, columns = ['file_name'])

# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('hurs')].values[0]
hurs_nc = xr.open_dataset(f[0])

# time slice map:
hurs_time_slice = hurs_nc.sel(time = r.map_time).copy()


# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # this is probably not actually right 

# lat and lon coordinates closest
abslat = np.abs(hurs_nc.lat - cp_lat)
abslon = np.abs(hurs_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = hurs_nc.lon[lon_loc]
lat_grid = hurs_nc.lat[lat_loc]

hurs_grid_slice = hurs_nc.sel(lon = lon_grid, lat=lat_grid,
                         time = slice('1850-01-01', '2099-12-31')).copy()
del(hurs_nc)

Take a look at this data in R:

# map data:
print(py$hurs_time_slice$hurs)
## <xarray.DataArray 'hurs' (lat: 64, lon: 128)>
## array([[89.245094, 89.51959 , 89.67603 , ..., 88.926735, 89.13366 , 89.19289 ],
##        [90.81301 , 90.860435, 90.839966, ..., 90.29955 , 90.63197 , 90.74781 ],
##        [91.859375, 92.097946, 92.269356, ..., 91.17945 , 91.41972 , 91.63859 ],
##        ...,
##        [95.520386, 94.799515, 94.592995, ..., 96.2515  , 96.32956 , 96.23767 ],
##        [95.91351 , 95.84933 , 95.68604 , ..., 96.05101 , 96.09133 , 96.04043 ],
##        [96.13935 , 96.15942 , 96.17313 , ..., 96.30332 , 96.311195, 96.29911 ]],
##       dtype=float32)
## Coordinates:
##     time     datetime64[ns] 2100-12-31
##   * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
##   * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
##     units:            %
##     variable:         hurs
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_hurs_RCP  Standard-RCP-4.5~hector...
hurs_val <- py$hurs_time_slice$hurs$values
hurs_lon <- py$hurs_time_slice$lon$values
hurs_lat <- py$hurs_time_slice$lat$values
grid <- expand.grid(list(lon=pr_lon,lat=pr_lat))
hurs <- cbind(grid, 
             value=t(matrix(aperm(hurs_val,c(2,1)), 1,length(hurs_lat)*length(hurs_lon))))

as.character(py$hurs_time_slice$time) %>%
  substr(.,  37, 43) ->
  hurs_time
  
ggplot(hurs, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
  ggtitle(paste(hurs_time, 'hurs (%)'))

# grid time series data:
print(py$hurs_grid_slice$hurs)
## <xarray.DataArray 'hurs' (time: 3000)>
## array([69.31585 , 58.68924 , 70.73456 , ..., 33.22224 , 49.70032 , 50.694588],
##       dtype=float32)
## Coordinates:
##   * time     (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
##     lat      float64 37.67
##     lon      float64 255.9
## Attributes:
##     units:            %
##     variable:         hurs
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_hurs_RCP  Standard-RCP-4.5~hector...
cp_hurs <- data.frame(value = py$hurs_grid_slice$hurs$values)

data.frame(time = as.character(py$hurs_grid_slice$time$values)) %>%
  separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
  select(-day) %>%
  mutate(year = as.integer(year),
         month = as.integer(month),
         time_index = as.integer(row.names(.))) ->
  cp_time

ggplot(cbind(cp_time, cp_hurs), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly hurs (%), 1850-2100') 

ggplot(cbind(cp_time, cp_hurs) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
  ggtitle('College Park monthly hurs (%), 2021-2100') 

psl

sea level pressure can’t have a CP time series

py_files = pd.DataFrame(r.files, columns = ['file_name'])

# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('psl')].values[0]
psl_nc = xr.open_dataset(f[0])

# time slice map:
psl_time_slice = psl_nc.sel(time = r.map_time).copy()
del(psl_nc)

Take a look at this data in R:

# map data:
print(py$psl_time_slice$psl)
## <xarray.DataArray 'psl' (lat: 64, lon: 128)>
## array([[ 98851.45 ,  98860.33 ,  98870.234, ...,  98832.06 ,  98837.234,
##          98843.73 ],
##        [ 98777.375,  98818.11 ,  98858.99 , ...,  98667.055,  98700.68 ,
##          98737.84 ],
##        [ 98885.29 ,  98927.47 ,  98962.516, ...,  98738.74 ,  98787.6  ,
##          98837.7  ],
##        ...,
##        [100282.2  , 100281.23 , 100289.21 , ..., 100350.28 , 100316.06 ,
##         100293.5  ],
##        [100591.66 , 100602.336, 100616.016, ..., 100580.99 , 100580.75 ,
##         100584.36 ],
##        [100882.72 , 100889.836, 100897.44 , ..., 100864.52 , 100870.03 ,
##         100876.1  ]], dtype=float32)
## Coordinates:
##     time     datetime64[ns] 2100-12-31
##   * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
##   * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
##     units:            Pa
##     variable:         psl
##     experiment:       historical
##     ensemble:         r14i1p1f1
##     model:            CanESM5
##     stitching_id:     RCP  Standard-RCP-4.5~hectorUI1~1
##     recipe_location:  data/stitched_CanESM5_psl_RCP  Standard-RCP-4.5~hectorU...
psl_val <- py$psl_time_slice$psl$values
psl_lon <- py$psl_time_slice$lon$values
psl_lat <- py$psl_time_slice$lat$values
grid <- expand.grid(list(lon=psl_lon,lat=psl_lat))
psl <- cbind(grid, 
             value=t(matrix(aperm(psl_val,c(2,1)), 1,length(psl_lat)*length(psl_lon))))

as.character(py$psl_time_slice$time) %>%
  substr(.,  37, 43) ->
  pr_time
  
ggplot(psl, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
  ggtitle(paste(pr_time, 'psl (Pa)'))